A Proofs As mentioned in sections 3 and 4, our dataset D contains the random perturbation vectors ξ and side information

Neural Information Processing Systems

B.1 Loss function. Mathematically, the conditional total variation loss function (11) can be explicitly written as: L ... The joint loss minimization task is performed using a network architecture that trains two parallel networks simultaneously; the decoder is a mirrored version of the encoder. Given the time-series nature of the data, we follow a rolling-window approach for network training, as shown in Algorithm 2. Once the ... In this section, we discuss the data generation process for the simulated data used in Section 5.1. The data is generated following [Page Jr, 1984].
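
The rolling-window training mentioned in the fragment above can be sketched generically; this is not the referenced Algorithm 2, and the fixed window length and step size below are illustrative assumptions.

```python
def rolling_windows(n_obs, window, step=1):
    """Yield (train, test) index ranges that slide forward through a time
    series, so the model is always trained on a window of past observations
    and evaluated on the points immediately after it."""
    start = 0
    while start + window < n_obs:
        train = range(start, start + window)
        test = range(start + window, min(start + window + step, n_obs))
        yield train, test
        start += step

# Example: 10 observations, window of 5, stepping 2 at a time.
for train, test in rolling_windows(10, window=5, step=2):
    print(list(train), list(test))
```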


Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization

Neural Information Processing Systems

Large language models have demonstrated remarkable capabilities, but their performance is heavily reliant on effective prompt engineering. Automatic prompt optimization (APO) methods are designed to automate this and can be broadly categorized into those targeting instructions (instruction optimization, IO) vs. those targeting exemplars (exemplar optimization, EO). Despite their shared objective, the two have evolved rather independently, with IO receiving more research attention recently. This paper seeks to bridge this gap by comprehensively comparing the performance of representative IO and EO techniques, both in isolation and in combination, on a diverse set of challenging tasks. Our findings reveal that intelligently reusing model-generated input-output pairs, obtained from evaluating prompts on the validation set, as exemplars consistently improves performance on top of IO methods, yet this practice is currently under-investigated. We also find that, despite the recent focus on IO, how we select exemplars can outweigh how we optimize instructions: EO strategies as simple as random search, applied with unoptimized seed instructions, can outperform state-of-the-art IO methods. Moreover, we observe a synergy between EO and IO, with optimal combinations surpassing the individual contributions. We conclude that studying exemplar optimization, both as a standalone method and in its optimal combination with instruction optimization, remains a crucial aspect of APO and deserves greater consideration in future research, even in the era of highly capable instruction-following models.
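
To make the random-search EO baseline concrete, here is a minimal sketch (not the paper's implementation); `candidate_pairs` and `score_fn` are assumed stand-ins for the model-generated validation pairs and the validation metric.

```python
import random

def random_search_exemplars(candidate_pairs, score_fn, k=4, n_trials=20, seed=0):
    """Keep the best-scoring set of k exemplars found by random search.

    candidate_pairs: (input, output) pairs, e.g. model-generated answers
        collected while evaluating prompts on the validation set.
    score_fn: assumed callable scoring a prompt built from the exemplars
        (e.g. accuracy on a held-out split).
    """
    rng = random.Random(seed)
    best_exemplars, best_score = None, float("-inf")
    for _ in range(n_trials):
        exemplars = rng.sample(candidate_pairs, k)
        score = score_fn(exemplars)
        if score > best_score:
            best_exemplars, best_score = exemplars, score
    return best_exemplars, best_score
```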


Counting Still Counts: Understanding Neural Complex Query Answering Through Query Relaxation

Brunink, Yannick, Daza, Daniel, He, Yunjie, Cochez, Michael

arXiv.org Artificial Intelligence

Neural methods for Complex Query Answering (CQA) over knowledge graphs (KGs) are widely believed to learn patterns that generalize beyond explicit graph structure, allowing them to infer answers that are unreachable through symbolic query processing. In this work, we critically examine this assumption through a systematic analysis comparing neural CQA models with an alternative, training-free query relaxation strategy that retrieves possible answers by relaxing query constraints and counting resulting paths. Across multiple datasets and query structures, we find several cases where neural and relaxation-based approaches perform similarly, with no neural model consistently outperforming the latter. Moreover, a similarity analysis reveals that their retrieved answers exhibit little overlap, and that combining their outputs consistently improves performance. These results call for a re-evaluation of progress in neural query answering: despite their complexity, current models fail to subsume the reasoning patterns captured by query relaxation. Our findings highlight the importance of stronger non-neural baselines and suggest that future neural approaches could benefit from incorporating principles of query relaxation.
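
A minimal sketch of one plausible instantiation of the relaxation baseline for 2-hop queries follows; the paper's exact relaxation and counting scheme may differ, and the triple-list representation is an assumption.

```python
from collections import defaultdict

def relaxed_answers(triples, anchor, r1, r2):
    """Score candidates for the 2-hop query anchor --r1--> ?m --r2--> ?a by
    counting paths in which at most one relation constraint is relaxed.

    triples: iterable of (head, relation, tail).
    Returns {entity: path_count}; higher counts rank candidates higher.
    """
    out = defaultdict(list)                    # head -> [(relation, tail)]
    for h, r, t in triples:
        out[h].append((r, t))

    scores = defaultdict(int)
    for rel1, mid in out[anchor]:
        for rel2, ans in out[mid]:
            # A path counts if it violates at most one of the two
            # relation constraints, i.e. the query is mildly relaxed.
            if (rel1 != r1) + (rel2 != r2) <= 1:
                scores[ans] += 1
    return dict(scores)
```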


The Role of Active Learning in Modern Machine Learning

Werner, Thorben, Schmidt-Thieme, Lars, Yalavarthi, Vijaya Krishna

arXiv.org Artificial Intelligence

Even though Active Learning (AL) is widely studied, it is rarely applied in contexts outside its own scientific literature. We posit that the reason for this is AL's high computational cost coupled with the comparatively small lifts it typically generates in scenarios with few labeled points. In this work we study the impact of different methods of combating this low-data scenario, namely data augmentation (DA), semi-supervised learning (SSL), and AL. We find that AL is by far the least efficient method of solving the low-data problem, generating a lift of only 1-4% over random sampling, while DA and SSL methods can generate up to 60% lift in combination with random sampling. However, when AL is combined with strong DA and SSL techniques, it is, surprisingly, still able to provide improvements. Based on these results, we frame AL not as a method to combat missing labels, but as the final building block for squeezing the last bits of performance out of the data after appropriate DA and SSL methods have been applied.
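
For context on the random-sampling comparison, below is a generic pool-based active-learning loop, not the paper's code; `fit` and `predict_proba` are assumed wrappers around an arbitrary classifier.

```python
import numpy as np

def active_learning_loop(fit, predict_proba, X_pool, y_pool, labeled_idx,
                         strategy="uncertainty", rounds=10, batch=16, seed=0):
    """Spend the same labeling budget either uniformly at random or on the
    model's least-confident pool points, then return the final model."""
    rng = np.random.default_rng(seed)
    labeled = set(labeled_idx)
    for _ in range(rounds):
        idx = sorted(labeled)
        model = fit(X_pool[idx], y_pool[idx])
        unlabeled = np.array(sorted(set(range(len(X_pool))) - labeled))
        if strategy == "random":
            picked = rng.choice(unlabeled, size=batch, replace=False)
        else:  # uncertainty sampling: lowest maximum class probability
            confidence = predict_proba(model, X_pool[unlabeled]).max(axis=1)
            picked = unlabeled[np.argsort(confidence)[:batch]]
        labeled.update(int(i) for i in picked)
    idx = sorted(labeled)
    return fit(X_pool[idx], y_pool[idx])
```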


Reviews: Regret Bounds for Online Portfolio Selection with a Cardinality Constraint

Neural Information Processing Systems

Summary: The paper studies the online portfolio selection problem under cardinality constraints and provides two algorithms that achieve sublinear regret: one for the full-information setting and one for the bandit-feedback setting. The paper also provides lower bounds for both settings. Both algorithms take the same approach of splitting the problem into two learning problems: learning the optimal combination of assets, and learning the optimal portfolio over those assets. To learn the optimal combination of assets, a version of either the multiplicative weights algorithm (full information) or EXP3 (bandit feedback) is used.
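
A minimal sketch of the full-information variant as described, under the simplifying assumptions that each subset expert plays the uniform portfolio (the paper also learns the within-subset portfolio) and that all subsets are enumerated explicitly:

```python
import itertools
import math

def mw_subset_portfolio(price_relatives, n_assets, k, eta=0.1):
    """Run multiplicative weights with one expert per cardinality-k subset,
    each expert playing the uniform portfolio over its subset; return the
    final wealth and per-expert log-weights.

    price_relatives: T rows of n_assets positive price ratios.
    """
    experts = list(itertools.combinations(range(n_assets), k))
    log_w = [0.0] * len(experts)          # log-weights for numerical stability
    wealth = 1.0
    for x_t in price_relatives:
        # Form this round's portfolio as the weight-averaged expert mixture.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        portfolio = [0.0] * n_assets
        for wj, subset in zip(w, experts):
            for i in subset:
                portfolio[i] += (wj / total) / k
        wealth *= sum(p * x for p, x in zip(portfolio, x_t))
        # Multiplicative update: reward each expert by its log-return.
        for j, subset in enumerate(experts):
            log_w[j] += eta * math.log(sum(x_t[i] for i in subset) / k)
    return wealth, log_w
```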


ImProver: Agent-Based Automated Proof Optimization

Ahuja, Riyaz, Avigad, Jeremy, Tetali, Prasad, Welleck, Sean

arXiv.org Artificial Intelligence

Large language models (LLMs) have been used to generate formal proofs of mathematical theorems in proof assistants such as Lean. However, we often want to optimize a formal proof with respect to various criteria, depending on its downstream use. For example, we may want a proof to adhere to a certain style, or to be readable, concise, or modularly structured. Having suitably optimized proofs is also important for learning tasks, especially since human-written proofs may not be optimal for that purpose. To this end, we study a new problem of automated proof optimization: rewriting a proof so that it is correct and optimizes an arbitrary criterion, such as length or readability. As a first method for automated proof optimization, we present ImProver, a large-language-model agent that rewrites proofs to optimize arbitrary user-defined metrics in Lean. We find that naively applying LLMs to proof optimization falls short, so we incorporate various improvements into ImProver, such as the use of symbolic Lean context in a novel Chain-of-States technique, as well as error correction and retrieval. We test ImProver on rewriting real-world undergraduate, competition, and research-level mathematics theorems, finding that it is capable of rewriting proofs so that they are substantially shorter, more modular, and more readable.
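
The agent loop might look like the following hedged reconstruction from the abstract (not ImProver's code); `propose_rewrite`, `check_with_lean`, and `metric` are hypothetical hooks for the LLM call, the Lean checker, and the user-defined criterion.

```python
def optimize_proof(theorem, proof, metric, propose_rewrite, check_with_lean,
                   max_iters=8):
    """Iteratively ask the LLM for a rewrite, verify it, and keep only
    verified rewrites that improve the user-defined metric."""
    best_proof, best_score = proof, metric(proof)
    feedback = None                  # Lean errors fed back to the next attempt
    for _ in range(max_iters):
        candidate = propose_rewrite(theorem, best_proof, feedback)
        ok, feedback = check_with_lean(theorem, candidate)
        if ok and metric(candidate) > best_score:
            best_proof, best_score = candidate, metric(candidate)
            feedback = None
    return best_proof
```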


Model Cascading for Code: Reducing Inference Costs with Model Cascading for LLM Based Code Generation

Chen, Boyuan, Zhu, Mingzhi, Dolan-Gavitt, Brendan, Shafique, Muhammad, Garg, Siddharth

arXiv.org Artificial Intelligence

The rapid development of large language models (LLMs) has led to significant advancements in code completion tasks. While larger models have higher accuracy, they also cost much more to run. Meanwhile, model cascading has proven effective at conserving computational resources while enhancing accuracy in LLMs on natural language generation tasks: it generates output with the smallest model in a set, and only queries larger models when the output fails to meet predefined quality criteria. However, this strategy has not been used for code completion tasks, primarily because assessing the quality of code completions differs substantially from assessing natural language, as the former relies heavily on functional correctness. To address this, we propose letting each model generate and execute a set of test cases for its solutions, and using the test results as the cascading threshold. We show that our model cascading strategy reduces computational costs while increasing accuracy compared to generating the output with a single model. We also introduce a heuristic to determine the optimal combination of the number of solutions, test cases, and test lines each model should generate, given a budget. Compared to speculative decoding, our method works on black-box models and achieves the same level of cost-accuracy trade-off, while providing far more choices based on the server's budget. Ours is the first work to optimize the cost-accuracy trade-off for LLM code generation with model cascading.
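
One plausible reading of the test-gated cascade is sketched below (not the authors' implementation); the `gen_solutions`/`gen_tests` model interface and the `run_test` sandbox are assumptions.

```python
def cascade_generate(models, problem, run_test, n_solutions=5, n_tests=5,
                     threshold=0.8):
    """Walk the cascade from smallest to largest model; accept the first
    model whose best candidate passes enough of its own generated tests.

    models: ordered smallest -> largest, each with gen_solutions(problem, n)
        and gen_tests(problem, n) methods (assumed interface).
    run_test: sandboxed executor, (solution, test) -> bool (assumed).
    """
    best_solution = None
    for model in models:
        solutions = model.gen_solutions(problem, n_solutions)
        tests = model.gen_tests(problem, n_tests)

        def pass_rate(sol):
            return sum(run_test(sol, t) for t in tests) / max(len(tests), 1)

        best_rate, best_solution = max((pass_rate(s), s) for s in solutions)
        if best_rate >= threshold:       # quality gate met: stop escalating
            return best_solution
    return best_solution                 # fall back to the largest model's best
```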


A Unified Model for Spatio-Temporal Prediction Queries with Arbitrary Modifiable Areal Units

Chen, Liyue, Fang, Jiangyi, Liu, Tengfei, Cao, Shaosheng, Wang, Leye

arXiv.org Artificial Intelligence

Spatio-Temporal (ST) prediction is crucial for making informed decisions in urban location-based applications such as ride-sharing. However, existing ST models often require a region partition as a prerequisite, which leads to two main pitfalls. First, location-based services need ad-hoc regions for various purposes, requiring multiple ST models with varying scales and zones that can be costly to support. Second, different ST models may produce conflicting outputs, resulting in confusing predictions. In this paper, we propose One4All-ST, a framework that can conduct ST prediction for arbitrary modifiable areal units using only one model. To reduce the cost of obtaining multi-scale predictions, we design an ST network with hierarchical spatial modeling and scale-normalization modules that learn multi-scale representations efficiently and equally. To address prediction inconsistencies across scales, we propose a dynamic programming scheme that solves the formulated optimal-combination problem, minimizing the predicted error, and we support this scheme with theoretical analysis. In addition, we suggest using an extended quad-tree to index the optimal combinations for quick responses to arbitrary modifiable areal units in practical online scenarios. Extensive experiments on two real-world datasets verify the efficiency and effectiveness of One4All-ST for ST prediction over arbitrary modifiable areal units. The source code and data of this work are available at https://github.com/uctb/One4All-ST.
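
The optimal-combination step can be illustrated with a hedged sketch over a quad-tree of cells; the node layout and the additive-error assumption are illustrative rather than taken from the paper.

```python
def best_combination(cell):
    """Return (prediction, estimated_error) for a quad-tree cell, choosing
    between the model's direct prediction at this scale and the sum of the
    children's optimal predictions, whichever has lower estimated error.

    cell: {"pred": float, "err": float, "children": list of 4 cells or None}
    """
    if not cell["children"]:                      # leaf: finest scale available
        return cell["pred"], cell["err"]
    child = [best_combination(c) for c in cell["children"]]
    combined_pred = sum(p for p, _ in child)      # demand aggregates over sub-cells
    combined_err = sum(e for _, e in child)       # errors assumed additive
    if cell["err"] <= combined_err:
        return cell["pred"], cell["err"]          # keep the coarse prediction
    return combined_pred, combined_err            # use the children's combination
```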